Cross-language transfer of semantic annotation via targeted crowdsourcing: task design and evaluation
Authors
Abstract
The development of a natural language speech application requires a semantic annotation process, and multilingual porting of speech applications further increases the cost and complexity of the annotation task. In this paper we address the problem of transferring the semantic annotation of a source language corpus to a low-resource target language via crowdsourcing. Current crowdsourcing approaches face several problems. First, the available crowdsourcing platforms have a skewed distribution of language speakers. Second, speech applications require domain-specific knowledge. Third, the lack of reference annotation in the target language makes quality control of crowd workers very difficult. In this paper we address these issues on the task of cross-language transfer of domain-specific semantic annotation from an Italian spoken language corpus to Greek, via targeted crowdsourcing. The issue of domain knowledge transfer is addressed by priming the workers with the source language concepts. The lack of reference annotation is handled with a consensus-based annotation algorithm. The quality of annotation transfer is assessed using source language references and inter-annotator agreement. We demonstrate that the proposed computational methodology is viable and achieves acceptable annotation quality.
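The abstract names a consensus-based annotation algorithm and inter-annotator agreement as quality controls but does not spell out either one. The sketch below is a minimal illustration, not the authors' actual method: it assumes each worker assigns one concept tag per aligned chunk, aggregates labels by simple majority vote, and measures agreement with mean pairwise Cohen's kappa. All worker IDs, concept tags, and the min_agreement threshold are hypothetical.

```python
# Illustrative consensus aggregation and inter-annotator agreement,
# assuming all workers label the same chunks in the same order.
from collections import Counter
from itertools import combinations

def consensus_labels(annotations, min_agreement=0.5):
    """Majority-vote aggregation of per-chunk concept tags.

    annotations: dict worker_id -> list of tags, aligned to the same chunks.
    Returns the winning tag per chunk, or None when no tag is chosen by at
    least min_agreement of the workers (no consensus).
    """
    workers = list(annotations.values())
    n_workers = len(workers)
    consensus = []
    for chunk_labels in zip(*workers):            # one chunk, all workers' tags
        tag, count = Counter(chunk_labels).most_common(1)[0]
        consensus.append(tag if count / n_workers >= min_agreement else None)
    return consensus

def mean_pairwise_kappa(annotations):
    """Average Cohen's kappa over all worker pairs (observed vs. chance agreement)."""
    def cohen_kappa(a, b):
        n = len(a)
        p_o = sum(x == y for x, y in zip(a, b)) / n            # observed agreement
        ca, cb = Counter(a), Counter(b)
        p_e = sum((ca[t] / n) * (cb[t] / n) for t in set(a) | set(b))  # chance
        return 1.0 if p_e == 1.0 else (p_o - p_e) / (1 - p_e)
    pairs = list(combinations(annotations.values(), 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)

# Toy example: three workers tag four chunks with hypothetical domain concepts.
workers = {
    "w1": ["hardware", "problem", "hardware", "action"],
    "w2": ["hardware", "problem", "software", "action"],
    "w3": ["hardware", "action",  "software", "action"],
}
print(consensus_labels(workers))              # ['hardware', 'problem', 'software', 'action']
print(round(mean_pairwise_kappa(workers), 3))
```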
Similar resources
Selection and aggregation techniques for crowdsourced semantic annotation task
Crowdsourcing is an accessible and cost-effective alternative to traditional methods of collecting and annotating data. The application of crowdsourcing to simple tasks has been well investigated. However, complex tasks like semantic annotation transfer require workers to take simultaneous decisions on chunk segmentation and labeling while acquiring on-the-go domain-specific knowledge. The incre...
Crowdsourcing Disagreement for Collecting Semantic Annotation
This paper proposes an approach to gathering semantic annotation, which rejects the notion that human interpretation can have a single ground truth, and is instead based on the observation that disagreement between annotators can signal ambiguity in the input text, as well as how the annotation task has been designed. The purpose of this research is to investigate whether disagreement-aware cro...
CROWD-IN-THE-LOOP: A Hybrid Approach for Annotating Semantic Roles
Crowdsourcing has proven to be an effective method for generating labeled data for a range of NLP tasks. However, multiple recent attempts at using crowdsourcing to generate gold-labeled training data for semantic role labeling (SRL) reported only modest results, indicating that SRL is perhaps too difficult a task to be effectively crowdsourced. In this paper, we postulate that while producing ...
Semantic Network-driven News Recommender Systems: a Celebrity Gossip Use Case
Information overload on the Internet motivates the need for filtering tools. Recommender systems play a significant role in such a scenario, as they provide automatically generated suggestions. In this paper, we propose a novel recommendation approach based on semantic network exploration. Given a set of celebrity gossip news articles, our systems leverage both natural language processing tex...
Crowdsourcing Annotation for Machine Learning in Natural Language Processing Tasks
Human annotators are critical for creating the necessary datasets to train statistical learners, but annotation cost and limited access to qualified annotators form a data bottleneck. In recent years, researchers have investigated overcoming this obstacle using crowdsourcing, which is the delegation of a particular task to a large group of untrained individuals rather than a select trained few...
Journal: Language Resources and Evaluation
Volume: 52, Issue: -
Pages: -
Publication year: 2014